Lecture 8

Maps

Choropleth maps show geographical areas or regions that are coloured, shaded or patterned based on the data.

We see this often during election days.

Each of these maps shows data for the same event, but the impressions they convey are very different. Each faces two main problems.

  1. The underlying quantities of interest are only partly spatial.
  • Can we disaggregate information further?
  1. The regions themselves are of wildly differing sizes, and they differ in a way that is not well-correlated with the magnitudes of the underlying votes.

1. Electoral data in U.S.

# Try to comment each line 
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(jtools)

setwd("~/Documents/ibs_course/BUS240/data")
load('election.rda')

names(election)
##  [1] "state"        "st"           "fips"         "total_vote"   "vote_margin" 
##  [6] "winner"       "party"        "pct_margin"   "r_points"     "d_points"    
## [11] "pct_clinton"  "pct_trump"    "pct_johnson"  "pct_other"    "clinton_vote"
## [16] "trump_vote"   "johnson_vote" "other_vote"   "ev_dem"       "ev_rep"      
## [21] "ev_oth"       "census"
election %>% select(state, total_vote, r_points, pct_trump, party, census) %>%
  sample_n(7)
## # A tibble: 7 × 6
##   state                total_vote r_points pct_trump party      census   
##   <chr>                     <dbl>    <dbl>     <dbl> <chr>      <chr>    
## 1 Connecticut             1644920  -13.6       40.9  Democratic Northeast
## 2 Alaska                   318608   14.7       51.3  Republican West     
## 3 West Virginia            721233   41.7       67.8  Republican South    
## 4 District of Columbia     311268  -86.8        4.09 Democratic South    
## 5 Wisconsin               2976150    0.770     47.2  Republican Midwest  
## 6 Hawaii                   428937  -32.2       30.0  Democratic West     
## 7 Pennsylvania            6166729    0.710     48.2  Republican Northeast

State-level vote totals and shares for the 2016 US Presidential election. The variables are as follows:

state. State name
total_vote. Total votes cast
r_points. Percentage point difference between Trump share and Clinton
pct_trump. Trump vote share 
party. Winning party
census. Census region

Let’s recall the Cleveland plot

party_colors <- c("#2E74C0", "#CB454A") 

# Try to comment each line 
p0 <- ggplot(data = subset(election, st %nin% "DC"),
             mapping = aes(x = r_points,
                           y = reorder(state, r_points),
                           color = party))
p1 <- p0 + geom_vline(xintercept = 0, color = "gray30") + geom_point(size = 2)
p2 <- p1 + scale_color_manual(values = party_colors)
p3 <- p2 + scale_x_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30, 40),
                              labels = c("30\n (Clinton)", "20", "10", "0",
                                         "10", "20", "30", "40\n(Trump)"))
p3 + facet_wrap(~ census, ncol=2, scales="free_y") + 
     guides(color="none") + labs(x = "Point Margin", y = "") +
     theme(axis.text=element_text(size=8))

Note that you don’t have to represent spatial data spatially. Of course, spatial representations can be beneficial and look quite professional. We need to think hard about what we’d gain from the spatial representations. It’d lose information a lot.

2. Maps

The first task in drawing a map is to get a data frame with the right information in it, and in the right order. We use maps library which provides some pre-drawn map data.

library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
us_states <- map_data("state")
head(us_states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>
glimpse(us_states)
## Rows: 15,537
## Columns: 6
## $ long      <dbl> -87.46201, -87.48493, -87.52503, -87.53076, -87.57087, -87.5…
## $ lat       <dbl> 30.38968, 30.37249, 30.37249, 30.33239, 30.32665, 30.32665, …
## $ group     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ order     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ region    <chr> "alabama", "alabama", "alabama", "alabama", "alabama", "alab…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

We can make a blank state map right away with this data, using geom_polygon()

p <- ggplot(data = us_states,
            mapping = aes(x = long, y = lat, group = group))

p + geom_polygon(fill = "white", color = "black")

The map is plotted with latitude and longitude points, which are there as scale elements mapped to the x and y axes. A map is, after all, just a set of lines drawn in the right order on a grid.

p <- ggplot(data = us_states,
            aes(x = long, y = lat,
                group = group, fill = region))
p + geom_polygon(color = "gray90") 

Tell R not to plot a legend.

p <- ggplot(data = us_states,
            aes(x = long, y = lat,
                group = group, fill = region))
p + geom_polygon(color = "gray90") + guides(fill = 'none') 

Thin the lines to make the state borders

p <- ggplot(data = us_states,
            aes(x = long, y = lat,
                group = group, fill = region))
p + geom_polygon(color = "gray90", size = .1) + guides(fill = 'none') 

Techniques for map projection are a fascinating world of their own, but for now, just remember we can transform the default projection used by geom_polygon(), via the coord_map() function. You’ll remember that we said that projection onto a coordinate system is a necessary part of the plotting process for any data.

Normally it is left implicit, and we use Mercator projection, which looks like a 2 X 2 panel.

The Albers projection requires two latitude parameters, lat0 and lat1. We give them their conventional values for a US map here.

# play around lat0 and lat1 values
p + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
    guides(fill = "none")

So what are we doing? We have a state-level map.

Now we need to get our own data on to the map. Remember, underneath that map is just a big data frame specifying a large number of lines that need to be drawn. We have to merge our data with that data frame. Let’s go back to election data

head(election)
## # A tibble: 6 × 22
##   state st     fips total…¹ vote_…² winner party pct_m…³ r_poi…⁴ d_poi…⁵ pct_c…⁶
##   <chr> <chr> <dbl>   <dbl>   <dbl> <chr>  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Alab… AL        1  2.12e6  588708 Trump  Repu…  0.277    27.7   -27.7     34.4
## 2 Alas… AK        2  3.19e5   46933 Trump  Repu…  0.147    14.7   -14.7     36.6
## 3 Ariz… AZ        4  2.60e6   91234 Trump  Repu…  0.035     3.5    -3.5     44.6
## 4 Arka… AR        5  1.13e6  304378 Trump  Repu…  0.269    26.9   -26.9     33.6
## 5 Cali… CA        6  1.42e7 4269978 Clint… Demo…  0.300   -30.0    30.0     61.5
## 6 Colo… CO        8  2.78e6  136386 Clint… Demo…  0.0491   -4.91    4.91    48.2
## # … with 11 more variables: pct_trump <dbl>, pct_johnson <dbl>,
## #   pct_other <dbl>, clinton_vote <dbl>, trump_vote <dbl>, johnson_vote <dbl>,
## #   other_vote <dbl>, ev_dem <dbl>, ev_rep <dbl>, ev_oth <dbl>, census <chr>,
## #   and abbreviated variable names ¹​total_vote, ²​vote_margin, ³​pct_margin,
## #   ⁴​r_points, ⁵​d_points, ⁶​pct_clinton
## # ℹ Use `colnames()` to see all variable names
print("See the values in the state column")
## [1] "See the values in the state column"
head(election$state)
## [1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
## [6] "Colorado"

Interpret the following codes:

election$region <- tolower(election$state)
us_states_elec <- left_join(us_states, election)
## Joining, by = "region"

Now we have a merged data.

head(us_states_elec)
##        long      lat group order  region subregion   state st fips total_vote
## 1 -87.46201 30.38968     1     1 alabama      <NA> Alabama AL    1    2123372
## 2 -87.48493 30.37249     1     2 alabama      <NA> Alabama AL    1    2123372
## 3 -87.52503 30.37249     1     3 alabama      <NA> Alabama AL    1    2123372
## 4 -87.53076 30.33239     1     4 alabama      <NA> Alabama AL    1    2123372
## 5 -87.57087 30.32665     1     5 alabama      <NA> Alabama AL    1    2123372
## 6 -87.58806 30.32665     1     6 alabama      <NA> Alabama AL    1    2123372
##   vote_margin winner      party pct_margin r_points d_points pct_clinton
## 1      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 2      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 3      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 4      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 5      588708  Trump Republican     0.2773    27.72   -27.72       34.36
## 6      588708  Trump Republican     0.2773    27.72   -27.72       34.36
##   pct_trump pct_johnson pct_other clinton_vote trump_vote johnson_vote
## 1     62.08        2.09      1.46       729547    1318255        44467
## 2     62.08        2.09      1.46       729547    1318255        44467
## 3     62.08        2.09      1.46       729547    1318255        44467
## 4     62.08        2.09      1.46       729547    1318255        44467
## 5     62.08        2.09      1.46       729547    1318255        44467
## 6     62.08        2.09      1.46       729547    1318255        44467
##   other_vote ev_dem ev_rep ev_oth census
## 1      31103      0      9      0  South
## 2      31103      0      9      0  South
## 3      31103      0      9      0  South
## 4      31103      0      9      0  South
## 5      31103      0      9      0  South
## 6      31103      0      9      0  South

Now that everything is in one big data frame, we can plot it in a map.

p <- ggplot(data = us_states_elec,
            aes(x = long, y = lat, group = group, fill = party))

p + geom_polygon(color = "gray90", size = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

Add some details:

p0 <- ggplot(data = us_states_elec,
             mapping = aes(x = long, y = lat, group = group, fill = party))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
      coord_map(projection = "albers", lat0 = 39, lat1 = 45) 
p2 <- p1 + scale_fill_manual(values = party_colors) +
      labs(title = "Election Results 2016", fill = "none")
library(ggthemes)
p2 + theme_map() 

With the map data frame in place, we can map other variables. Let’s try a continuous measure, such as the percentage of the vote received by Donald Trump, pct_trump

p0 <- ggplot(data = us_states_elec,
             mapping = aes(x = long, y = lat, group = group, fill = pct_trump))
# The default color used in the p1 object is blue.
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
      coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p1 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")

Want to change the direction of gradation

p2 <- p1 + scale_fill_gradient(low = "white", high = "#CB454A") +
      labs(title = "Trump vote") 
p2 + theme_map() + labs(fill = "Percent")

For election results, we might prefer a gradient that diverges from a midpoint. The scale_gradient2() function gives us a blue-red spectrum that passes through white by default.

p0 <- ggplot(data = us_states_elec,
             mapping = aes(x = long, y = lat, group = group, fill = d_points))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins") 
p2 + theme_map() + labs(fill = "Percent")

Purple America map

p3 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"),
                                high = "blue", breaks = c(-25, 0, 25, 50, 75)) +
    labs(title = "Winning margins") 
p3 + theme_map() + labs(fill = "Percent")

Why does it look so blueish? This is because Washington DC is included in the data, and hence the scale. Even though it is barely visible on the map, DC has by far the highest points margin in favor of the Democrats of any unit of observation in the data. If we omit it, we’ll see that our scale shifts in a way that does not just affect the top of the blue end, but re-centers the whole gradient and makes the red side more vivid as a result.

p0 <- ggplot(data = subset(us_states_elec,
                           region %nin% "district of columbia"),
             aes(x = long, y = lat, group = group, fill = d_points))

p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) +
      coord_map(projection = "albers", lat0 = 39, lat1 = 45) 

p2 <- p1 + scale_fill_gradient2(low = "red",
                                mid = scales::muted("purple"),
                                high = "blue") +
    labs(title = "Winning margins") 
p2 + theme_map() + labs(fill = "Percent")

3. Going County level

setwd("~/Documents/ibs_course/BUS240/data")
load('county_map.rda')
load('county_data.rda')
glimpse(county_map)
## Rows: 191,382
## Columns: 7
## $ long  <dbl> 1225889, 1235324, 1244873, 1244129, 1272010, 1276797, 1273832, 1…
## $ lat   <dbl> -1275020, -1274008, -1272331, -1267515, -1262889, -1295514, -129…
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
## $ hole  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ piece <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ group <fct> 0500000US01001.1, 0500000US01001.1, 0500000US01001.1, 0500000US0…
## $ id    <chr> "01001", "01001", "01001", "01001", "01001", "01001", "01001", "…
county_map %>% sample_n(5)
##        long        lat  order  hole piece            group    id
## 1 1667128.6  -524167.3 183694 FALSE     1 0500000US54067.1 54067
## 2 1565944.7  -658729.0  68984 FALSE     1 0500000US21195.1 21195
## 3  101985.6 -1433176.3 157837 FALSE     1 0500000US48093.1 48093
## 4  548620.5 -1126258.7  23253 FALSE     1 0500000US05127.1 05127
## 5 1962495.9  -533602.4 172435 FALSE     1 0500000US51085.1 51085
glimpse(county_data)
## Rows: 3,195
## Columns: 32
## $ id               <chr> "0", "01000", "01001", "01003", "01005", "01007", "01…
## $ name             <chr> NA, "1", "Autauga County", "Baldwin County", "Barbour…
## $ state            <fct> NA, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, A…
## $ census_region    <fct> NA, South, South, South, South, South, South, South, …
## $ pop_dens         <fct> "[   50,  100)", "[   50,  100)", "[   50,  100)", "[…
## $ pop_dens4        <fct> "[ 45,  118)", "[ 45,  118)", "[ 45,  118)", "[118,71…
## $ pop_dens6        <fct> "[ 82,  215)", "[ 82,  215)", "[ 82,  215)", "[ 82,  …
## $ pct_black        <fct> "[10.0,15.0)", "[25.0,50.0)", "[15.0,25.0)", "[ 5.0,1…
## $ pop              <int> 318857056, 4849377, 55395, 200111, 26887, 22506, 5771…
## $ female           <dbl> 50.8, 51.5, 51.5, 51.2, 46.5, 46.0, 50.6, 45.2, 53.4,…
## $ white            <dbl> 77.7, 69.8, 78.1, 87.3, 50.2, 76.3, 96.0, 27.2, 54.3,…
## $ black            <dbl> 13.2, 26.6, 18.4, 9.5, 47.6, 22.1, 1.8, 69.9, 43.6, 2…
## $ travel_time      <dbl> 25.5, 24.2, 26.2, 25.9, 24.6, 27.6, 33.9, 26.9, 24.0,…
## $ land_area        <dbl> 3531905.43, 50645.33, 594.44, 1589.78, 884.88, 622.58…
## $ hh_income        <int> 53046, 43253, 53682, 50221, 32911, 36447, 44145, 3203…
## $ su_gun4          <fct> NA, NA, "[11,54]", "[11,54]", "[ 5, 8)", "[11,54]", "…
## $ su_gun6          <fct> NA, NA, "[10,12)", "[10,12)", "[ 7, 8)", "[10,12)", "…
## $ fips             <dbl> 0, 1000, 1001, 1003, 1005, 1007, 1009, 1011, 1013, 10…
## $ votes_dem_2016   <int> NA, NA, 5908, 18409, 4848, 1874, 2150, 3530, 3716, 13…
## $ votes_gop_2016   <int> NA, NA, 18110, 72780, 5431, 6733, 22808, 1139, 4891, …
## $ total_votes_2016 <int> NA, NA, 24661, 94090, 10390, 8748, 25384, 4701, 8685,…
## $ per_dem_2016     <dbl> NA, NA, 0.23956855, 0.19565310, 0.46660250, 0.2142203…
## $ per_gop_2016     <dbl> NA, NA, 0.7343579, 0.7735147, 0.5227141, 0.7696616, 0…
## $ diff_2016        <int> NA, NA, 12202, 54371, 583, 4859, 20658, 2391, 1175, 1…
## $ per_dem_2012     <dbl> NA, NA, 0.2657577, 0.2156657, 0.5125229, 0.2621857, 0…
## $ per_gop_2012     <dbl> NA, NA, 0.7263374, 0.7738975, 0.4833755, 0.7306638, 0…
## $ diff_2012        <int> NA, NA, 11012, 47443, 334, 3931, 17780, 2808, 714, 14…
## $ winner           <chr> NA, NA, "Trump", "Trump", "Trump", "Trump", "Trump", …
## $ partywinner16    <chr> NA, NA, "Republican", "Republican", "Republican", "Re…
## $ winner12         <chr> NA, NA, "Romney", "Romney", "Obama", "Romney", "Romne…
## $ partywinner12    <chr> NA, NA, "Republican", "Republican", "Democrat", "Repu…
## $ flipped          <chr> NA, NA, "No", "No", "Yes", "No", "No", "No", "No", "N…
county_data %>%
    select(id, name, state, pop_dens, pct_black) %>%
    sample_n(5)
##      id            name state      pop_dens   pct_black
## 1 48507   Zavala County    TX [    0,   10) [ 0.0, 2.0)
## 2 08013  Boulder County    CO [  100,  500) [ 0.0, 2.0)
## 3 21077 Gallatin County    KY [   50,  100) [ 0.0, 2.0)
## 4 36015  Chemung County    NY [  100,  500) [ 5.0,10.0)
## 5 04019     Pima County    AZ [  100,  500) [ 2.0, 5.0)

We merge the data frames using the shared FIPS id column:

county_full <- left_join(county_map, county_data, by = "id")

With the data merged, we can map the population density per square mile.

p <- ggplot(data = county_full,
            mapping = aes(x = long, y = lat,
                          fill = pop_dens, 
                          group = group))

p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()

p2 <- p1 + scale_fill_brewer(palette="Blues",
                             labels = c("0-10", "10-50", "50-100", "100-500",
                                        "500-1,000", "1,000-5,000", ">5,000"))

p2 + labs(fill = "Population per\nsquare mile") +
  theme_map() +
  guides(fill = guide_legend(nrow = 1)) + 
  theme(legend.position = "bottom")

If you try out the p1 object you will see that ggplot produces a legible map, but by default it chooses an unordered categorical layout. This is because the pop_dens variable is not ordered. We could recode it so that R is aware of the ordering. Alternatively, we can manually supply the right sort of scale using the scale_fill_brewer() function, together with a nicer set of labels. We will learn more about this scale function in the next chapter. We also tweak how the legend is drawn using the guides() function to make sure each element of the key appears on the same row. Again, we will see this use of guides() in more detail in the next chapter. The use of coord_equal() makes sure that the relative scale of our map does not change even if we alter the overall dimensions of the plot.

We can now do exactly the same thing for our map of percent Black population by county. Once again, we specify a palette for the fill mapping using scale_fill_brewer(), this time choosing a different range of hues for the map.

p <- ggplot(data = county_full,
            mapping = aes(x = long, y = lat, fill = pct_black, 
                          group = group))
p1 <- p + geom_polygon(color = "gray90", size = 0.05) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette="Greens")

p2 + labs(fill = "US Population, Percent Black") +
    guides(fill = guide_legend(nrow = 1)) + 
    theme_map() + theme(legend.position = "bottom")

What is your data?

Even if our data is grouped into spatial units, it is always worth asking whether a map is the best way to present it.

Let’s take our state-level opiates data and redraw it as a time-series plot.

setwd("~/Documents/ibs_course/BUS240/data")
load('opiates.rda')
head(opiates, 5)
## # A tibble: 5 × 11
##    year state     fips deaths popul…¹ crude adjus…² adjus…³ region abbr  divis…⁴
##   <int> <chr>    <int>  <int>   <int> <dbl>   <dbl>   <dbl> <ord>  <chr> <chr>  
## 1  1999 Alabama      1     37  4.43e6   0.8     0.8     0.1 South  AL    East S…
## 2  1999 Alaska       2     27  6.25e5   4.3     4       0.8 West   AK    Pacific
## 3  1999 Arizona      4    229  5.02e6   4.6     4.7     0.3 West   AZ    Mounta…
## 4  1999 Arkansas     5     28  2.65e6   1.1     1.1     0.2 South  AR    West S…
## 5  1999 Califor…     6   1474  3.35e7   4.4     4.5     0.1 West   CA    Pacific
## # … with abbreviated variable names ¹​population, ²​adjusted, ³​adjusted_se,
## #   ⁴​division_name

Here are the list of variables:

year. Year
state. State name.
fips. State FIPS code.
deaths. Number of opiate-related deaths.
population. Population.
crude. Crude death rate.
adjusted. Adjusted death rate.
adjusted.se. Standard error of Adjusted death rate.
region. Census region. (Stored as an ordered factor.)
abbr. Abbreviated state name.
division_name. Census Division. (Character.)

We could just plot the trends for every state, as we did at the very beginning with the gapminder data.

p <- ggplot(data = opiates,
            mapping = aes(x = year, y = adjusted,
                          group = state))
p + geom_line(color = "gray70") 
## Warning: Removed 17 row(s) containing missing values (geom_path).

But fifty states is too many lines to keep track of at once.

library(ggrepel)
p0 <- ggplot(data = drop_na(opiates, division_name),
            mapping = aes(x = year, y = adjusted))
            
p1 <- p0 + geom_line(color = "gray70", 
              mapping = aes(group = state)) 

p2 <- p1 + geom_smooth(mapping = aes(group = division_name),
                       se = FALSE)

p3 <- p2 + geom_text_repel(data = subset(opiates,
                                         year == max(year) & abbr !="DC"),
                     mapping = aes(x = year, y = adjusted, label = abbr),
                     size = 1.8, segment.color = NA, nudge_x = 30) +
            coord_cartesian(c(min(opiates$year),max(opiates$year)))
p3 + labs(x = "", y = "Rate per 100,000 population",
       title = "State-Level Opiate Death Rates by Census Division, 1999-2014") +
    facet_wrap(~ reorder(division_name, -adjusted, na.rm = TRUE), nrow  = 3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 17 row(s) containing missing values (geom_path).